Keepin’ It Real: Semi-Supervised Learning with Realistic Tuning

Authors

  • Andrew B. Goldberg
  • Xiaojin Zhu

Abstract

We address two critical issues involved in applying semi-supervised learning (SSL) to a real-world task: parameter tuning and choosing which (if any) SSL algorithm is best suited for the task at hand. To gain a better understanding of these issues, we carry out a medium-scale empirical study comparing supervised learning (SL) to two popular SSL algorithms on eight natural language processing tasks under three performance metrics. We simulate how a practitioner would go about tackling a new problem, including parameter tuning using cross validation (CV). We show that, under such realistic conditions, each of the SSL algorithms can be worse than SL on some datasets. However, we also show that CV can select SL/SSL to achieve “agnostic SSL,” whose performance is almost always no worse than SL. While CV is often dismissed as unreliable for SSL due to the small amount of labeled data, we show that it is in fact effective for accuracy even when the labeled dataset size is as small as 10.
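The CV-based choice between SL and SSL behind "agnostic SSL" can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' experimental protocol. The supervised learner is a nearest-centroid classifier. The SSL learner (`train_self_training`) is a hypothetical self-training stand-in, not necessarily either of the two SSL algorithms the study compares. Folds are assigned round-robin, accuracy is the only metric, and the same unlabeled pool is shared by every fold. Each candidate, including different settings of one SSL parameter, is scored by k-fold CV on the labeled data only, and the best-scoring candidate is kept.

```python
from functools import partial
from statistics import mean


def nearest_centroid_fit(X, y):
    """Supervised baseline: one centroid (mean feature vector) per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))


def train_sl(X_lab, y_lab, X_unlab):
    """SL: ignores the unlabeled pool entirely."""
    return nearest_centroid_fit(X_lab, y_lab)


def train_self_training(X_lab, y_lab, X_unlab, rounds=3):
    """Hypothetical SSL stand-in: self-training on top of nearest centroids."""
    centroids = nearest_centroid_fit(X_lab, y_lab)
    for _ in range(rounds):
        # Pseudo-label the unlabeled pool, then refit on labeled + pseudo-labeled points.
        pseudo = [predict(centroids, x) for x in X_unlab]
        centroids = nearest_centroid_fit(list(X_lab) + list(X_unlab), list(y_lab) + pseudo)
    return centroids


def cv_accuracy(train_fn, X_lab, y_lab, X_unlab, k=5):
    """Mean k-fold CV accuracy, measured only on held-out *labeled* points."""
    n = len(X_lab)
    if n < 2:
        raise ValueError("need at least 2 labeled examples for CV")
    k = max(2, min(k, n))  # every fold gets >= 1 test point and >= 1 training point
    fold_scores = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]  # round-robin fold assignment
        train_idx = [i for i in range(n) if i % k != fold]
        model = train_fn([X_lab[i] for i in train_idx], [y_lab[i] for i in train_idx], X_unlab)
        correct = sum(predict(model, X_lab[i]) == y_lab[i] for i in test_idx)
        fold_scores.append(correct / len(test_idx))
    return mean(fold_scores)


def agnostic_select(candidates, X_lab, y_lab, X_unlab, k=5):
    """Pick the candidate (SL or an SSL setting) with the best CV accuracy.

    Ties go to the earliest entry, so list SL first to fall back to it.
    """
    scores = {name: cv_accuracy(fn, X_lab, y_lab, X_unlab, k) for name, fn in candidates.items()}
    best = max(scores.values())
    chosen = next(name for name, s in scores.items() if s == best)
    return chosen, scores


if __name__ == "__main__":
    X_lab = [[0.0], [1.0], [9.0], [10.0]]
    y_lab = ["a", "a", "b", "b"]
    X_unlab = [[0.5], [9.5], [-20.0]]
    candidates = {
        "SL": train_sl,
        "self-train, rounds=1": partial(train_self_training, rounds=1),
        "self-train, rounds=3": partial(train_self_training, rounds=3),
    }
    print(agnostic_select(candidates, X_lab, y_lab, X_unlab, k=2))
```

Listing SL first means the procedure falls back to SL whenever no SSL setting beats it in CV, which is the "no worse than SL" behavior the abstract describes. With very few labeled points the CV estimates are coarse: in the demo above, each of the two folds can only score 0, 0.5, or 1.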

Similar resources

Detecting Concept Drift in Data Stream Using Semi-Supervised Classification

Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...

Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...

Swarm Intelligence in Semi-supervised Classification

This Paper represents a literature review of Swarm intelligence algorithm in the area of semi-supervised classification. There are many research papers for applying swarm intelligence algorithms in the area of machine learning. Some algorithms of SI are applied in the area of ML either solely or hybrid with other ML algorithms. SI algorithms are also used for tuning parameters of ML algorithm, ...

Deep Learning Neural Network with Semi supervised Segmentation for Predicting Retinal and Cancer Cell Diseased

In medical field, diagnosis of diseases competently carried out by using the image processing. So that to retrieve the relevant data from the amalgamation of resulting image is too difficult. Here the segmentation done by semi supervised learning then the result is tuned by using Deep Learning Neural Network. Higher tuning of results will leads to efficient detection of disease. The experiment ...

Accelerating Eulerian Fluid Simulation With Convolutional Networks

Real-time simulation of fluid and smoke is a long standing problem in computer graphics, where state-of-the-art approaches require large compute resources, making real-time applications often impractical. In this work, we propose a data-driven approach that leverages the approximation power of deep-learning methods with the precision of standard fluid solvers to obtain both fast and highly real...

Publication date: 2009